On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems
Authors
Abstract
We study the non-stationary stochastic multiarmed bandit (MAB) problem and propose two generic algorithms, namely, the limited memory deterministic sequencing of exploration and exploitation (LM-DSEE) and the Sliding-Window Upper Confidence Bound# (SW-UCB#). We rigorously analyze these algorithms in abruptly-changing and slowly-varying environments and characterize their performance. We show that the expected cumulative regret for these algorithms under either of the environments is upper bounded by sublinear functions of time, i.e., the time average of the regret asymptotically converges to zero. We complement our analytic results with numerical illustrations.
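The abstract does not spell out the SW-UCB# index or the LM-DSEE epoch structure, so the sketch below is only a generic sliding-window UCB loop: statistics are computed over the most recent plays, which lets the index track a mean that changes over time. The window length, the exploration constant `c`, the Bernoulli rewards, and the abrupt mean switch at t = 1000 are all illustrative assumptions, not the paper's algorithm.

```python
import math
import random

def sliding_window_ucb(mean_fn, n_arms, horizon, window, c=1.0, seed=0):
    """Generic sliding-window UCB on Bernoulli arms (illustrative sketch).

    All statistics are computed over the most recent `window` plays, so the
    index tracks recent rewards and can adapt when the arm means change.
    """
    rng = random.Random(seed)
    history = []  # (arm, reward) pairs, trimmed to the last `window` rounds
    total_reward = 0.0

    for t in range(1, horizon + 1):
        # Empirical counts and reward sums inside the current window.
        counts = [0] * n_arms
        sums = [0.0] * n_arms
        for arm, r in history:
            counts[arm] += 1
            sums[arm] += r

        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            # Play every arm at least once before trusting the indices.
            arm = untried[0]
        else:
            width = len(history)
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + c * math.sqrt(math.log(width) / counts[a]),
            )

        # Bernoulli reward with a (possibly time-varying) mean.
        reward = 1.0 if rng.random() < mean_fn(t)[arm] else 0.0
        total_reward += reward
        history.append((arm, reward))
        if len(history) > window:
            history.pop(0)

    return total_reward

# Abruptly-changing example: the better arm switches at t = 1000.
means = lambda t: [0.3, 0.7] if t <= 1000 else [0.7, 0.3]
print(sliding_window_ucb(means, n_arms=2, horizon=2000, window=200))
```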
Similar sources
Optimal Policies for a Class of Restless Multiarmed Bandit Scheduling Problems with Applications to Sensor Management
Consider the Markov decision problems (MDPs) arising in the areas of intelligence, surveillance, and reconnaissance in which one selects among different targets for observation so as to track their position and classify them from noisy data [9], [10]; medicine in which one selects among different regimens to treat a patient [1]; and computer network security in which one selects different compu...
On the Optimal Reward Function of the Continuous Time Multiarmed Bandit Problem
The optimal reward function associated with the so-called "multiarmed bandit problem" for general Markov-Feller processes is considered. It is shown that this optimal reward function has a simple expression (product form) in terms of individual stopping problems, without requiring any smoothness properties of the optimal reward function, either for the global problem or for the individual stopping probl...
Index Policies for Discounted Bandit Problems with Availability Constraints
A multiarmed bandit problem is studied when the arms are not always available. The arms are first assumed to be intermittently available with some state/action-dependent probabilities. It is proven that no index policy can attain the maximum expected total discounted reward in every instance of that problem. The Whittle index policy is derived, and its properties are studied. Then it is assumed ...
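The excerpt only states that arms are intermittently available and that a Whittle-type index policy is derived; it does not give the index itself. The following sketch is therefore just a schematic of the basic mechanism, playing the available arm with the largest index; the fixed index values and the per-round availability probabilities are placeholder assumptions, not the Whittle indices.

```python
import random

def play_available_highest_index(indices, availability_prob, horizon, seed=0):
    """Toy index policy: each round, play the available arm with the largest
    (fixed, precomputed) index; skip the round if no arm is available.

    `indices` are placeholder per-arm index values, and availability is
    modeled as an independent coin flip per arm per round.
    """
    rng = random.Random(seed)
    plays = [0] * len(indices)
    for _ in range(horizon):
        available = [a for a in range(len(indices))
                     if rng.random() < availability_prob[a]]
        if not available:
            continue  # no arm can be played this round
        arm = max(available, key=lambda a: indices[a])
        plays[arm] += 1
    return plays

# Arm 0 has the highest index but is rarely available.
print(play_available_highest_index([0.8, 0.5, 0.2], [0.3, 0.9, 0.9], horizon=1000))
```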
Asymptotically efficient adaptive allocation rules for the multiarmed bandit problem with switching cost - IEEE Transactions on Automatic Control
We consider multiarmed bandit problems with switching cost, define uniformly good allocation rules, and restrict attention to such rules. We present a lower bound on the asymptotic performance of uniformly good allocation rules and construct an allocation scheme that achieves the bound. We discover that, despite the inclusion of a switching cost, the proposed allocation scheme achieves the same a...
An approach for handling risk and uncertainty in multiarmed bandit problems
An approach is presented to deal with risk in multiarmed bandit problems. Specifically, the well-known exploration-exploitation dilemma is addressed from the point of view of maximizing a utility function which measures the decision maker's attitude towards risk and uncertain outcomes. A link with preference theory is thus established. Simulation results are provided in order to support ...
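The excerpt does not name the utility function the authors use, so the snippet below only illustrates one common way to encode risk attitude: scoring each arm by the empirical mean of an exponential (risk-averse) utility of its observed rewards, so that arms with volatile payoffs are penalized relative to their plain mean. The function name and the `risk_aversion` parameter are assumptions for illustration.

```python
import math
import random

def risk_averse_choice(observed_rewards, risk_aversion=1.0):
    """Pick the arm maximizing the empirical mean of the exponential utility
    u(r) = -exp(-risk_aversion * r), a standard risk-averse utility."""
    def score(rewards):
        return sum(-math.exp(-risk_aversion * r) for r in rewards) / len(rewards)
    return max(range(len(observed_rewards)), key=lambda a: score(observed_rewards[a]))

# Arm 0: steady rewards; arm 1: same mean but high variance.
rng = random.Random(0)
arm0 = [0.5 for _ in range(100)]
arm1 = [rng.choice([0.0, 1.0]) for _ in range(100)]
print(risk_averse_choice([arm0, arm1], risk_aversion=2.0))  # prefers the steady arm
```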
Journal: CoRR
Volume: abs/1802.08380
Pages: -
Publication year: 2018